SVitchboard 1: Small Vocabulary Tasks from Switchboard 1

Authors

  • Simon King
  • Chris Bartels
  • Jeff Bilmes
Abstract

We present a conversational telephone speech data set designed to support research on novel acoustic models. Small vocabulary tasks from 10 words up to 500 words are defined using subsets of the Switchboard-1 corpus; each task has a completely closed vocabulary (an OOV rate of 0%). We justify the need for these tasks, describe the algorithm for selecting them from a large corpus, give a statistical analysis of the data and present baseline whole-word hidden Markov model recognition results. The goal of the paper is to define a common data set and to encourage other researchers to use it.
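To make the closed-vocabulary idea concrete, here is a minimal sketch of how a task with a 0% OOV rate could be carved out of a transcribed corpus. It is not the selection algorithm described in the paper: it assumes, purely for illustration, that the vocabulary is the N most frequent words and that an utterance is kept only if every word in it is in that vocabulary. The function names and the toy transcripts are hypothetical.

```python
from collections import Counter


def oov_rate(utterances, vocabulary):
    """Return the fraction of word tokens that fall outside the vocabulary."""
    tokens = [w for utt in utterances for w in utt.split()]
    if not tokens:
        return 0.0
    return sum(w not in vocabulary for w in tokens) / len(tokens)


def closed_vocabulary_subset(utterances, vocab_size):
    """Pick the vocab_size most frequent words (an illustrative assumption)
    and keep only utterances composed entirely of those words."""
    counts = Counter(w for utt in utterances for w in utt.split())
    vocabulary = {w for w, _ in counts.most_common(vocab_size)}
    kept = [
        utt for utt in utterances
        if utt.split() and all(w in vocabulary for w in utt.split())
    ]
    return vocabulary, kept


# Toy transcripts, invented for this example.
corpus = [
    "yeah",
    "uh-huh",
    "yeah i know",
    "i know",
    "that is remarkable",
    "yeah yeah",
]

vocabulary, task = closed_vocabulary_subset(corpus, vocab_size=3)
print(sorted(vocabulary))
print(task)
print(oov_rate(task, vocabulary))
```

By construction, every utterance in the selected subset contains only in-vocabulary words, so the OOV rate measured on the subset is zero, whereas the same vocabulary leaves some tokens of the full toy corpus uncovered.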

Similar resources

SVitchboard II and fiSVer i: high-quality limited-complexity corpora of conversational English speech

In this paper, we introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional ASR research requires considerable computational resources and has slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities (especially in academia), in order to f...

Expert Systems for Document Retrieval: Problems in Capturing Synonym Relations from the Experts

A key problem in designing information retrieval systems is making the information contained in the system available quickly and easily, with a minimum of training or preparation required by the user. New users may be experts in their fields and have no problems with the mechanics of using the computer; yet they may still have difficulty in retrieving information because they are not familiar ...

On Two Extensions of Abstract Categorial Grammars

Abstract Categorial Grammar • Type-theoretic grammar formalism for describing natural languages • Based on the implicative fragment of linear logic • Resource sensitivity • Simple yet sufficiently expressive • Mildly context-sensitive languages are generated by second-order ACGs (de Groote and Pogodalla 2004) ...

Applications of virtual-evidence based speech recognizer training

We present two applications of our previously proposed virtual-evidence (VE) based speech recognizer training algorithm [1, 2]. The first relates to two-pass training where segmentations obtained during the first pass are used as VE to train the subsequent pass. We use the TIMIT phone and SVitchboard continuous speech recognition tasks to demonstrate the benefits of using VE based training in tw...

Publication date: 2005